Semantics for Music Researchers: How Country is my Country?
Abstract
The Linking Open Data cloud contains several music-related datasets that hold great potential for enhancing the process of research in the field of Music Information Retrieval (MIR) and which, in turn, can be enriched by MIR results. We demonstrate a system with several related aims:

- to enable MIR researchers to utilise these datasets through incorporation in their research systems and workflows;
- to publish MIR research output on the Semantic Web linked to existing datasets (thereby also increasing the size and applicability of the datasets for use in MIR);
- to present MIR research output, with cross-referencing to other linked data sources, for manipulation and evaluation by researchers and re-use within the wider Semantic Web.

By way of example we gather and publish RDF describing signal collections derived from the country of an artist. Genre analysis over these collections and integration of collection and result metadata enables us to ask: "how country is my country?".

1 Background and motivation

Much of the work of researchers in the field of Music Information Retrieval (MIR) focusses on the algorithmic extraction of information from music. However, there are many problems associated with the design and implementation of distributed systems within which such algorithms might be deployed. We can broadly describe the process an MIR researcher typically follows in three steps. For each step we also highlight some of the issues and, at an abstract level, how linked data and Semantic Web technologies might assist in building a complete system.

1. Assemble a collection of audio input. To evaluate an algorithm, the researcher must acquire a wide selection of signal, typically digital audio files, for the algorithm to process. Music recordings are often restricted from free exchange amongst researchers, either explicitly through copyright or implicitly through the high overheads of managing detailed and intricate licensing.
Even when audio data is freely available and distributable, a difficult balance must be found to avoid overfitting algorithms to a particular set of signals. A widely shared, understood, and re-usable collection is critical for comparative evaluation. However, tuning an algorithm to such a collection during development (knowing it will be the benchmark) is likely to detrimentally affect performance against more randomly selected input (i.e. real-world tests). It is therefore useful to be able to create and modify large collections of audio data quickly and flexibly, and to share them between researchers for comparative evaluation. Restrictions on the distribution of actual audio files can be accommodated in two ways: by describing collections separately, and by correctly modelling the relationships between artefacts (e.g. distinguishing between a work, a performance of the work, recordings of the performance, and published media of the recording). Metadata exchange can then occur independently and be cross-referenced against any institutional or other private archive of audio. Linking existing metadata for audio files and basing collection generation on this information is desirable for quickly trialling an algorithm against particular musical facets (e.g. a particular period and style derived from the composers).

2. Apply the algorithm to the audio input. There are many MIR systems which enable an algorithm to be applied to signal. More recently some systems have begun to adopt practices and tools from the scientific workflow community, for example the Meandre workflow enactment system [1]. Any such system must be able to recognise an input collection and apply the algorithm across it. Where institutionally restricted collections of signal are in use, a system must match local audio files to any abstract, metadata-based collection descriptions.

3. Publish and evaluate algorithm output.
The MIR community has a seven-year history of comparative evaluation in the MIREX competition. The most recent (2010) MIREX adopted a Meandre-derived framework for executing the algorithms under test [2]. More generally, evaluation of results requires a common structure into which analytic output can be published for comparison. This should replace data structures inherited from whatever development tool or environment a researcher happened to be using. As faster computational resources become more readily available and can be applied to MIR tasks, the opportunity to undertake analysis on an ever greater scale brings with it the problem of managing ever greater quantities of result data. Links from results back to recorded signal (and audio file artefacts) are equally important, as is capturing provenance. A single algorithm is not normally sufficient to make a definitive assertion, e.g. to classify a recording as jazz. For this reason it is important that the representation of results can be used as input for creating derivative collections for further MIR analysis, so that information extracted by multiple algorithms can be combined and refined.

2 System overview and Country/country example

We employ new RDF encodings for collections and results that utilise existing ontologies (including the Music Ontology, GeoNames, the Provenance Vocabulary, and OAI-ORE). We also deploy a linked data audio file repository and services for publishing collections and results. Together these form a proof-of-concept system that addresses the problems outlined in the previous section. While the principles and design described here can be applied to all MIR systems, for demonstration purposes we have developed a specific use case known as Country/country.
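To make the idea of an RDF collection encoding concrete, here is a minimal sketch, using only the Python standard library, that serialises a set of track URIs as an OAI-ORE aggregation in N-Triples. The vocabulary terms (`ore:Aggregation`, `ore:aggregates`, `mo:Track`, `dcterms:title`) are real. Their combination here, however, is an assumption and not the encoding the system actually uses. The function name `collection_to_ntriples` and all example URIs are hypothetical, and the ORE resource map and provenance statements are omitted for brevity.

```python
"""Sketch: serialise a track collection as an OAI-ORE aggregation in N-Triples.

Assumption: this is an illustrative encoding, not the Country/country one;
all URIs used in the example are hypothetical.
"""

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ORE = "http://www.openarchives.org/ore/terms/"
MO = "http://purl.org/ontology/mo/"
DCTERMS = "http://purl.org/dc/terms/"


def iri(uri: str) -> str:
    """Format an absolute IRI as an N-Triples term."""
    return f"<{uri}>"


def literal(text: str) -> str:
    """Format a plain string literal, escaping characters N-Triples requires."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def collection_to_ntriples(collection_uri: str, title: str, track_uris: list[str]) -> str:
    """Return N-Triples describing an ORE aggregation of the given tracks."""
    c = iri(collection_uri)
    triples = [
        (c, iri(RDF_TYPE), iri(ORE + "Aggregation")),
        (c, iri(DCTERMS + "title"), literal(title)),
    ]
    for track in track_uris:
        t = iri(track)
        # The collection aggregates the track by URI only; no audio is embedded.
        triples.append((c, iri(ORE + "aggregates"), t))
        triples.append((t, iri(RDF_TYPE), iri(MO + "Track")))
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


if __name__ == "__main__":
    # Hypothetical URIs, for illustration only.
    print(collection_to_ntriples(
        "http://example.org/collections/france",
        "Tracks by artists based in France",
        ["http://example.org/track/1", "http://example.org/track/2"],
    ))
```

Each member is identified only by URI. A description of this kind can therefore be exchanged without the audio itself and later grounded against any local archive that holds those recordings.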
In this section we outline the components of the system, which approximately align to the steps in the previous section (with the addition of a pre-step). For each component we describe the generic purpose of the service, followed by the specific implementation in Country/country (in italics).

0. An Audio File Repository which serves audio files, and linked data about the audio files, over HTTP. For our public demonstrator a subset copy of the free-licensed Jamendo collection¹ has been used. Using the Music Ontology [3], the linked data asserts the relationship between each audio file and the track it is a recording of. It also gives the definitive URI for that track, as minted by the Jamendo linked data service at dbtune².

1. A Collection Builder web application that enables a user to publish sets of tracks described using RDF. The backend uses SPARQL to build collections and takes advantage of links between datasets. For example, the Jamendo service incorporates links to geographic locations as defined by GeoNames³, so the Collection Builder can identify all the tracks offered by Jamendo that were recorded by artists from a specific country. An optional second stage of the Collection Builder takes a collection and grounds the constituent tracks against available recordings of those tracks by posing SPARQL queries to Audio File Repositories. In the case of Country/country we ground a country-derived collection against our Audio File Repository of locally available signal.

2. The Analysis is performed by a NEMA [2] genre classification workflow. We have extended the myExperiment [4] scientific collaborative environment to support the Meandre [1] workflows used by NEMA. myExperiment has also been modified to accept the collections RDF published in step 1) and to marshal the tracks it contains to the analysis workflow.
Within the (Meandre-based) genre classification workflow, a head-end component has been written to dereference each track URI passed to the workflow. Using the linked data published by the signal repository, it retrieves both the local copy of the audio file and the reference to the original Jamendo identifier. This URI persists through the genre analysis workflow until it reaches a new tail-end component. There the analysis is published as RDF, including links back to the Jamendo URI.

3. A Results Viewer web application retrieves the collections RDF from 1) and the results RDF from 2), cross-referencing them via the URIs used throughout the system. The user can identify trends in genre classification within and between collections. Results can be pooled and compared using existing and new collections, and can inform the creation of new sets. Relevant associations from other linked data sets are also shown, e.g. artists of the same genre and country from DBpedia and the BBC for a particular analysed track. These demonstrate how further links can easily be made to existing datasets and can inform derivative collection generation.

¹ http://www.jamendo.com/
² http://dbtune.org/jamendo/
³ http://www.geonames.org/ontology/

3 Online demonstrator

The Country/country demonstrator system is available at: http://www.nema.ecs.soton.ac.uk/countrycountry/
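The cross-referencing performed by the Results Viewer, and the question in the title, can be illustrated with a minimal sketch. Given each collection's track URIs and the genre label published for each track by the analysis workflow, it joins the two on URI. It then reports the share of each collection's classified tracks that carry a given genre label. The in-memory dictionaries stand in for parsed collection and result RDF. The `genre_share` function, the collection names, and the data are all hypothetical.

```python
from collections import Counter


def genre_share(collections: dict[str, set[str]],
                results: dict[str, str],
                genre: str) -> dict[str, float]:
    """For each collection, the fraction of classified tracks labelled `genre`.

    Tracks are joined to results by URI. Tracks with no published result are
    skipped, and a collection with no classified tracks maps to 0.0.
    """
    shares = {}
    for name, tracks in collections.items():
        # Join collection membership to published results via shared URIs.
        labels = Counter(results[t] for t in tracks if t in results)
        total = sum(labels.values())
        shares[name] = labels[genre] / total if total else 0.0
    return shares


if __name__ == "__main__":
    # Hypothetical data: track URIs grouped by artist country, plus genre labels.
    base = "http://example.org/track/"
    collections = {
        "US": {base + "1", base + "2", base + "3", base + "4"},
        "FR": {base + "5", base + "6"},
    }
    results = {
        base + "1": "country", base + "2": "rock",
        base + "3": "country", base + "4": "country",
        base + "5": "jazz",
    }
    print(genre_share(collections, results, "country"))  # prints {'US': 0.75, 'FR': 0.0}
```

Tracks without a published result are skipped rather than counted as misses. Each share therefore reflects only what was actually classified. Mirroring the system's design, the join relies entirely on the URIs carried from collection through analysis to results.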